System and method for collection of a website in a past state and retroactive analysis thereof

ABSTRACT

A system and method for collection of a website in a past state and retroactive analysis thereof are provided. The method includes collecting, from a repository, at least one session replay; identifying, in the at least one collected session replay, at least one main state, wherein a main state is a portion of a session replay; selecting at least one webpage snapshot corresponding to a respective main state of the at least one identified main state, wherein each snapshot is a single-instant webpage state at a specific point in time; identifying, in the at least one selected snapshot, at least one webpage zone; and returning the at least one identified zone.

TECHNICAL FIELD

The present disclosure relates generally to webpage zone analysis and, in particular, to systems and methods for collection of a website in a past state and retroactive analysis thereof.

BACKGROUND

As retailers, service providers, and other operators of online platforms continue to expand and improve the content made available through user-facing web services, the complexity of the webpages grows as well. The continued development of web-focused content provides site operators with additional avenues by which to connect with users, but also increases the likelihood of web errors. Such web errors may cause significant challenges for website operators, reducing user engagement, revenues, and the like, which may, in turn, harm the operator's business. As a result, website owners, operators, and administrators may seek to better understand the development of a webpage throughout its lifecycle. Further, the same interested parties may seek means to isolate fully-functional webpage versions in order to improve upon the successes thereof.

While a variety of factors may be of interest to webpage owners, operators, and administrators seeking to improve site performance, the isolation of key factors may provide a streamlined web-improvement process. In order to understand which elements of a webpage function as intended and which do not, interested parties may seek zone-analysis tools in order to easily discretize and identify the components of a webpage. While zone analysis may be achieved manually, by collection and labelling of individual webpage zones or elements, such a manual process may be time-consuming where a webpage includes many content elements or zones. In addition, such manual solutions may require more time or effort than a webpage owner, operator, or administrator may wish to allot to zone analysis, particularly where multiple pages are to be analyzed. Further, such manual techniques may be applicable only to a current, “live,” version of a webpage, limiting a party's ability to conduct manual zone analyses for previous versions of a webpage.

In addition to the noted difficulties of applying manual zone-identification solutions, such solutions fail to provide for the identification of fully-functional webpages. As a webpage may include multiple faulty features, such as buttons which may be clicked to no effect, links with no destination, and the like, analysis of non-functional webpages may frustrate those seeking to improve site performance. As a result, solutions providing for automatic detection of functional webpage versions, from a set of webpage versions, may be desirable in order to improve the efficiency of webpage zone analysis. However, the described manual zone-identification solutions fail to provide for automatic detection of fully-functional webpage versions.

It would therefore be advantageous to provide a solution that would overcome the challenges noted above.

SUMMARY

A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments and is intended to neither identify key or critical elements of all embodiments nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the terms “some embodiments” or “certain embodiments” may be used herein to refer to a single embodiment or multiple embodiments of the disclosure.

Certain embodiments disclosed herein include a method for collection of a website in a past state and retroactive analysis thereof. The method comprises collecting, from a repository, at least one session replay; identifying, in the at least one collected session replay, at least one main state, wherein a main state is a portion of a session replay; selecting at least one webpage snapshot corresponding to a respective main state of the at least one identified main state, wherein each snapshot is a single-instant webpage state at a specific point in time; identifying, in the at least one selected snapshot, at least one webpage zone; and returning the at least one identified zone.

In addition, certain embodiments disclosed herein include a system for collection of a website in a past state and retroactive analysis thereof. The system comprises: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: collect, from a repository, at least one session replay; identify, in the at least one collected session replay, at least one main state, wherein a main state is a portion of a session replay; select at least one webpage snapshot corresponding to a respective main state of the at least one identified main state, wherein each snapshot is a single-instant webpage state at a specific point in time; identify, in the at least one selected snapshot, at least one webpage zone; and return the at least one identified zone.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosed embodiments will be apparent from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is an example network diagram depicting a network system disclosing the embodiments for collection of a website in a past state and retroactive analysis thereof.

FIG. 2 is a flowchart depicting a method for retroactive zone identification, according to an embodiment.

FIG. 3 is a flowchart depicting a method for generating session replays, according to an embodiment.

FIG. 4 is a flowchart depicting a method for identifying website main states, according to an embodiment.

FIG. 5 is a diagram depicting an unlabeled document object model (DOM) tree, according to an embodiment.

FIG. 6A is an illustration depicting a retroactive zoning analysis request tool, according to an embodiment.

FIG. 6B is an illustration depicting a snapshot selector, according to an embodiment.

FIG. 7 is an illustration of a zoning analysis presentation platform, according to an embodiment.

FIG. 8 is a schematic diagram of an analytic server, according to an embodiment.

DETAILED DESCRIPTION

It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts through several views.

FIG. 1 is an example network diagram depicting a network system 100 disclosing the embodiments for collection of a website in a past state and retroactive analysis thereof. The system 100 includes one or more user devices, 120-1 through 120-N (hereinafter, “user device” 120 or “user devices” 120), an analytic server 130, one or more web servers, 140-1 through 140-N (hereinafter “web server” 140 or “web servers” 140), and a database 150. Further, in the system 100, the various components listed are interconnected via a network 110.

The network 110 provides interconnectivity between the various components of the system. The network 110 may be, but is not limited to, a wireless, cellular, or wired network, a local area network (LAN), a wide area network (WAN), a metro area network (MAN), the Internet, the worldwide web (WWW), similar networks, and any combination thereof. The network 110 may be a full-physical network, including exclusively physical hardware, a fully-virtual network, including only simulated or otherwise virtualized components, or a hybrid physical-virtual network, including both physical and virtualized components. Further, the network 110 may be configured to encrypt data, both at rest and in motion, and to transmit encrypted, unencrypted, or partially-encrypted data. The network 110 may be configured to connect to the various components of the system 100 via wireless means such as, as examples and without limitation, Bluetooth (tm), long-term evolution (LTE), Wi-Fi, other, like, wireless means, and any combination thereof, or via wired means such as, as examples and without limitation, ethernet, universal serial bus (USB), other, like, wired means, and any combination thereof. Further, the network 110 may be configured to connect with the various components of the system 100 via any combination of wired and wireless means.

The user devices 120 may be devices allowing a user to interact with the system 100 for purposes including, as examples and without limitation, providing sessions to the system 100 for analysis, receiving returns or outputs from the system 100, configuring system 100 parameters, other, like, purposes, and any combination thereof. Further, a user device 120 may be configured to receive returns or outputs from the web servers 140 to view webpages or other content developed by any of the web servers 140. A user device 120 typically includes a web browser (not shown) or any application (virtual, web, mobile, and the like) which allows a user to view, download, interact with, and engage with content provided by the web servers 140, the analytic server 130, or both. Examples of user devices 120 may be smartphones, personal computers, business systems, dedicated kiosks, tablet computers, and other, like, devices.

Users of the user devices 120 may access at least one website hosted by the web servers 140. The website may be, for example, an online retail platform, an e-commerce platform, and the like. In some embodiments, the user devices 120 can access an application installed on and executed by the web servers 140. Such an application may include a mobile application (app), a cloud application, a web application, and the like. The various embodiments will be discussed herein with a reference to one or more websites, but are equally applicable to one or more applications.

In an embodiment, a user device 120 may be operated by an administrator of one or more websites hosted by the web server or servers 140. Through the user device 120, reports generated by the analytic server 130 may be viewed. The user device 120 may be further configured to allow for configuration of one or more components of the system 100, issuing or executing instructions, manipulating data, and the like.

The analytic server 130, depicted in detail with respect to FIG. 8, below, is a system configured to execute instructions, organize information, and otherwise process data. The analytic server 130 may be configured to execute the methods described hereinbelow, other, like, methods, and any combination thereof. As described with respect to FIG. 8, below, the analytic server 130 may include various processing, memory, networking, and other components allowing the analytic server 130 to execute instructions and provide data processing. The analytic server 130 may be implemented as physical hardware, as software virtualizing physical hardware, or as a combination of physical and virtualized components. The analytic server 130 may be connected to the network 110 via those means described with respect to the network 110, above. The various processes performed by the analytic server 130 are described in greater detail hereinbelow.

According to the disclosed embodiments, the analytic server 130 is configured to execute instructions for collection of a website in a past state and retroactive analysis thereof. A website in a past state may be any prior version of one or more webpages, including complete and incomplete webpages, as well as various interaction data and site metrics associated therewith.

The web servers 140 may be one or more web sources of data other than the inputs received from the user devices 120. The web servers 140 may include data relating to websites, data relating to webpages, other, like, data, and any combination thereof. Data from the web servers 140 may be stored in the database 150 and may be processed by the analytic server 130. Web servers 140 may be local web sources, remote web sources, or any combination thereof. Examples of web servers 140 include, without limitation, repositories of webpage information, repositories of webpage element or zone classifications, “live” webpages, other, like, sources, and any combination thereof. Web servers 140 may be connected with the network 110 via the means described hereinabove.

The database 150 is a data store configured to archive data permanently or semi-permanently. The database 150 may be configured to store information received from one or more web servers 140, user devices 120, and other, like, components, as well as to store data relevant to the operation of the analytic server 130 and any outputs therefrom. The database 150 may be a local system, a remote system, or a hybrid remote-local system. Further, the database 150 may be configured as a full-physical system, including exclusively physical components, as a virtualized system, including virtualized components, or as a hybrid physical-virtual system. Examples of devices which may be configured as a database 150 in the system 100 include, without limitation, local database hardware, cloud storage systems, remote storage servers, other, like, devices, and any combination thereof.

According to an embodiment, the database 150 may be configured to store or otherwise archive data relating to detection, identification, and analysis of webpage sessions including, without limitation, webpages, user interactions, user sessions, other, like, data, and any combination thereof. Further, the database 150 may be configured to transfer, to and from the analytic server 130, data necessary for the execution of the methods described hereinbelow, and may store or otherwise archive analytic server 130 inputs, analytic server 130 outputs, or both.

FIG. 2 is an example flowchart 200 depicting a method for retroactive zone identification, according to an embodiment.

At S210, a retroactive analysis request is received. A retroactive analysis request is a request specifying retroactive analysis of one or more webpages. The received retroactive analysis request may include one or more uniform resource locators (URLs), one or more analysis dates, one or more metrics of interest, one or more device view specifications, other, like, data features describing one or more aspects of the request, and any combination thereof. Further, a retroactive analysis request may be generated through a webpage, accessible through a web browser, as well as an application, and the like, where the web browser, the application, or both, may be installed on a user device, such as the user device, 120, of FIG. 1, above. A retroactive analysis request may be generated using a retroactive zoning analysis request tool, as described with respect to FIG. 6A, below. A retroactive analysis request may be received from a device such as, as an example and without limitation, a user device, such as the user device, 120, of FIG. 1, above.

In an embodiment, the retroactive analysis request received at S210 may include one or more URLs specifying auto-anonymized webpages. An auto-anonymized webpage is a webpage including information automatically anonymized by one or more anonymizing processes. An anonymizing process may be configured to, as examples and without limitation, remove, redact, obfuscate, or otherwise anonymize information including, as examples and without limitation, names, addresses, payment information, other, like, information, and any combination thereof. Where the retroactive analysis request includes one or more URLs specifying auto-anonymized webpages, the auto-anonymized webpages may be pre-anonymized at the time of request receipt, or may be fully or partially anonymized after receipt of the request, including by operation of various features or processes as may be described hereinbelow.
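As a non-limiting illustration, an anonymizing process of the redacting kind may be sketched as follows. The sketch assumes pattern-based redaction of email addresses and payment card numbers only; the function name `anonymize`, the regular expressions, and the placeholder strings are hypothetical and are not drawn from the disclosure, which does not specify how anonymization is performed.

```python
import re

# Assumed patterns: a simple email shape and 13-16 digit card numbers
# with optional space or hyphen separators. Both are illustrative only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def anonymize(text: str) -> str:
    """Redact email addresses and card-like digit runs from page text."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return CARD_RE.sub("[REDACTED_CARD]", text)


page_text = "Contact jane@example.com, card 4111 1111 1111 1111."
print(anonymize(page_text))
```

A production anonymizer would likely also need to handle names and postal addresses, which simple patterns of this kind do not reliably capture.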

At S220, one or more session replays are collected. A session replay includes collections of user interactions with one or more webpages of a website, describing the user's journey through the website. Session replays may include any or all of the user's interactions with the webpage or webpages between the time that the user connects to the webpage or website and the time the user disconnects from the webpage or website. Session replays may be generated according to various means including, without limitation, the means described with respect to FIG. 3, below. Further, collection of session replays at S220 may include generation of one or more session replay requests, as may be received at S310 of FIG. 3, below. Session replays may be, without limitation, webpages of a user's sessions collected over time, lists of interaction events, other, like, data features, and any combination thereof. The collected webpages may be HTML webpages, DOM of such webpages, and the like, as well as any combination thereof. Session replays may be collected from sources including, without limitation, memory or storage components of an analytic server, such as the analytic server, 130, of FIG. 1, above, from a database, such as the database, 150, of FIG. 1, above, from other, like, sources, and any combination thereof.

Collection of session replays at S220 may further include identification of webpage usage metrics. Webpage usage metrics are numerical indicators quantifying the various webpage interactions included in the session replay or replays collected. Webpage usage metrics may be identified by analysis of interaction events included in one or more session replays such as, as examples and without limitation, clicks on a button, hovers over an image, scrolls down a page, and the like.

Further, webpage usage metrics may be analyzed, following identification, to generate aggregate webpage usage metrics. Aggregate webpage usage metrics describe the overall interactions of multiple users, across multiple sessions, with the webpage. Aggregate webpage usage metrics may include statistics such as, as examples and without limitation, click rates, describing the percentage of site visitors clicking on a given element of a webpage, bounce rates, describing the number of users navigating to a page and leaving the page in a time below a predefined threshold, average time spent browsing a given page across all visitors, and the like, as well as any combination thereof. As an example, where multiple session replays include “click” interaction events, wherein users click on a given webpage feature, each click in the individual session replays may be recorded as contributing to the individual sessions' click metrics, while a click rate may be determined by analysis of the click metrics of each session replay collected.
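As a non-limiting illustration, the aggregation of per-session metrics into a click rate, a bounce count, and an average time on page may be sketched as follows. The sketch assumes each session has already been reduced to two values; the names `SessionMetrics` and `aggregate`, and the five-second bounce threshold, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionMetrics:
    clicked_target: bool    # whether this session clicked the element of interest
    seconds_on_page: float  # time between arriving at and leaving the page


def aggregate(sessions, bounce_threshold_s=5.0):
    """Combine per-session metrics into aggregate webpage usage metrics."""
    n = len(sessions)
    if n == 0:
        return {"click_rate": 0.0, "bounces": 0, "avg_time_s": 0.0}
    clicks = sum(s.clicked_target for s in sessions)
    bounces = sum(s.seconds_on_page < bounce_threshold_s for s in sessions)
    return {
        "click_rate": 100.0 * clicks / n,  # percentage of sessions with a click
        "bounces": bounces,                # sessions shorter than the threshold
        "avg_time_s": sum(s.seconds_on_page for s in sessions) / n,
    }


sessions = [
    SessionMetrics(True, 30.0),
    SessionMetrics(False, 2.0),
    SessionMetrics(True, 10.0),
    SessionMetrics(False, 4.0),
]
print(aggregate(sessions))
```

For the four example sessions, two include a click and two fall under the threshold, giving a click rate of 50 percent, two bounces, and an average of 11.5 seconds on the page.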

At S230, website main states are identified. Website main states are states of webpages reflecting the pages' pre-interaction structure and contents. Website main states include the various elements and structures of a webpage, as described by the page's underlying hypertext markup language (HTML) codebase, the page's document object model (DOM), such as is described with respect to FIG. 5, below, or the like, as well as any combination thereof. Identification of website main states at S230 provides for subsequent zone identification, and other, like, analyses, of one or more webpages in a fully-rendered state without user adjustments or inputs. Generation of website main states is described in detail with respect to FIG. 4, below. In addition, identification of website main states may include generation of one or more main state identification requests, as may be received at S410 of FIG. 4, below. Website main states may be data features such as, as examples and without limitation, full webpages, modified or unmodified HTML or other code sets, modified or unmodified DOMs, still images, other, like, features, and any combination thereof. Identified website main states, and various user interactions therewith, where the various user interactions may be included in the one or more session replays collected at S220, may be subsequently analyzed to identify one or more zones, such as at S250, as well as aggregate user interaction metrics, as may be presented as described with respect to FIG. 7, below.

At S240, webpage snapshots are selected based on the determined state. Webpage snapshots are single-instant recordings of the state of a webpage, reflecting the page's structure and contents at the moment of capture. Webpage snapshots may be selected by collection of user snapshot selection input, such as through a snapshot selector, as described with respect to FIG. 6B, below. Webpage snapshots may be website main states, such as those identified at S230, or modified versions thereof.

At S250, webpage zones are identified. Webpage zones are the various content zones, elements, and the like, included in the snapshot or snapshots selected at S240. Webpage zones may be identified by various methods including, without limitation, analysis of webpage HTML or other code, DOM analysis, other, like, analyses, and any combination thereof. Zones may be recorded by means including, without limitation, addition of various data tags, or other, like, features, to the selected snapshot or snapshots, generation of separate zone recording files, such as lists or tables, other, like, means, and any combination thereof. As an example, where a zone is identified according to a method described hereinbelow, the zone may be recorded by appending the selected snapshot's underlying HTML codebase with comments or other data features describing the element identified, the contents of the element, such as a picture or other data feature, other, like, information, and any combination thereof.
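As a non-limiting illustration, recording identified zones in a separate list may be sketched as follows. The sketch assumes a simple tag-based heuristic, in which any element whose tag appears in a fixed set is treated as a zone; this heuristic, the set `ZONE_TAGS`, and the names `ZoneCollector` and `identify_zones` are hypothetical and are not the identification technique of the disclosure or of any application incorporated by reference.

```python
from html.parser import HTMLParser

# Assumed set of tags treated as zones for illustration only.
ZONE_TAGS = {"header", "nav", "main", "section", "footer", "button", "img"}


class ZoneCollector(HTMLParser):
    """Walks snapshot HTML and records one entry per zone-like element."""

    def __init__(self):
        super().__init__()
        self.zones = []

    def handle_starttag(self, tag, attrs):
        if tag in ZONE_TAGS:
            attr_map = dict(attrs)
            self.zones.append({
                "tag": tag,
                "id": attr_map.get("id"),   # None when the element has no id
                "position": self.getpos(),  # (line, column) in the snapshot
            })


def identify_zones(snapshot_html: str) -> list:
    """Return a separate zone recording list for the given snapshot HTML."""
    collector = ZoneCollector()
    collector.feed(snapshot_html)
    return collector.zones


snapshot = '<header id="top"><nav></nav></header><div><button id="buy">Buy</button></div>'
for zone in identify_zones(snapshot):
    print(zone)
```

Keeping the zone list separate from the snapshot leaves the snapshot HTML unmodified, which is one of the recording options described above; tagging the snapshot itself is the alternative.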

Webpage zones may be identified by analysis of webpage HTML, and the like, and analysis of webpage DOMs by application of various techniques including, without limitation, application of machine learning tools, and the like, as well as various combinations thereof. Where a machine learning tool is applied to the identification of zones in a webpage DOM or HTML, the machine learning tool may be configured to correlate one or more HTML or DOM features with various pre-defined webpage zone types.

A machine learning tool, such as may be applicable to the identification of webpage zones, may be configured to execute a zone-identification method based on one or more machine learning trainings. A machine learning training may be a supervised or unsupervised machine learning training, wherein a machine learning tool may be configured to identify zones, and to automatically improve the accuracy of such identifications, based on an administrative response to a zone identification performed on a known data set. Where the zone identification performed on a known dataset produces an identified webpage corresponding with an administrator's understanding of the page's zone identities, the machine learning tool may be described as “trained,” and may be applied to the identification of zones in non-training datasets. Where the zone identification performed on a known dataset produces an identified webpage with zone identifications conflicting with an administrator's understanding, mis-identified zones may be flagged for revision, and the identification of the known dataset may be repeated, with methodological changes configured to avoid generating the same flagged identifications, until the machine learning tool reaches a state at which it may be described as “trained,” as above.

An example technique for identification of webpage zones is described in detail in U.S. application Ser. No. 16/915,190, assigned to the common assignee, the contents of which are hereby incorporated by reference.

At S260, identification results of webpage zones are returned. Identification results may be returned in formats including, without limitation, static visual displays, interactive visual displays, HTML or other code sets modified to describe the identified zones, such as by the addition of a data tag or other feature, webpage DOMs modified to describe the identified zones, separate files describing the zones identified in corresponding HTML, code, or DOM files, in other, like, formats, and any combination thereof. Returning identification results at S260 may include, without limitation, saving one or more data features to a storage, such as the database, 150, of FIG. 1, above, presenting identification information through a user device, such as the user device, 120, of FIG. 1, above, and the like, as well as any combination thereof. Where an interactive visual display is returned, such a display may be configured to provide identifications of the various webpage zones or elements, specified metrics associated with the various zones or elements, other, like, data features, and any combination thereof. Zoning analysis platforms may be configured to provide interactive visual displays of webpage zone analysis information, and may be understood with respect to FIG. 7, below.

FIG. 3 is an example flowchart 300 depicting a method for generating session replays, or like files describing the sequence of one or more sessions, according to an embodiment.

At S310, a session replay generation request is received. A session replay generation request is a request describing one or more sessions for replay collection. A session replay generation request may include one or more data features including, without limitation, URLs of one or more target webpages, a date range specification, a collection frequency specification, other, like, features, and any combination thereof. A session replay generation request may be generated automatically, including during the execution of S220, above. Further, a session replay generation request may be generated manually by user entry of one or more of the described data features into a session replay request tool, as may be accessible through a web browser or other application, and where such a web browser or application may be installed on a user device, such as the user device, 120, of FIG. 1, above. A session replay generation request may be received from one or more devices including, without limitation, user devices, such as the user devices, 120, of FIG. 1, above, analytic servers, such as the analytic server, 130, of FIG. 1, above, and the like, as well as any combination thereof.

At S320, target webpages are collected. Target webpages are webpages specified in the session replay generation request received at S310. Target webpages may be specified as one or more URLs, where each URL corresponds with a webpage hosted on a web server, such as the web server, 140, of FIG. 1, above. Target webpages may be collected by downloading target webpages in formats including, without limitation, whole webpages, HTML or other code types underlying the specified webpages, webpage DOMs, other, like, formats, or any combination thereof. Further, target webpages may be collected by recording or otherwise transcribing webpage evolution. Webpage evolution may be recorded by, without limitation, downloading webpage versions at any point at which the webpage, webpage DOM, webpage HTML, and the like, undergo one or more changes during a session, as well as timestamps, counters describing the number of evolutions in a session, and the like. Webpage evolution recording may further include recording webpage evolutions in real-time, providing second-for-second playback of changes to a webpage, the webpage's DOM or HTML, or the like. Target webpages may be collected according to one or more bases including, without limitation, scheduled collection, such as collection according to a schedule defined in a session replay generation request received at S310, event-triggered collection, such as collection upon generation of a new DOM for the same webpage, continuous collection, such as over the span of a session, according to other, like, bases, and any combination thereof.

As an example, webpage collection at S320 may include collection of the webpage upon a DOM update. Where collection occurs upon a DOM update, a first webpage version may be collected at the time of user connection, a second version may be collected at the time at which a user hovers over an expanding menu, altering the webpage's DOM to include a pop-out menu, and a third version may be collected at the time at which a user enters an email address into a “newsletter” field, altering the contents of the field element and, thus, the webpage DOM.
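As a non-limiting illustration, event-triggered collection upon a DOM update may be sketched as follows. The sketch assumes the DOM is available as a serialized string at each observation and detects a change by comparing content digests; the names `PageVersion` and `EvolutionRecorder`, and the digest-based comparison, are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PageVersion:
    timestamp: float  # seconds since the start of the session
    evolution: int    # counter: 0 for the first version, 1 for the next, ...
    dom: str          # serialized DOM at the time of collection


@dataclass
class EvolutionRecorder:
    versions: list = field(default_factory=list)
    _last_digest: str = ""

    def observe(self, timestamp: float, dom: str) -> bool:
        """Store a new version only if the DOM differs from the last one."""
        digest = hashlib.sha256(dom.encode("utf-8")).hexdigest()
        if digest == self._last_digest:
            return False  # no DOM update, nothing collected
        self._last_digest = digest
        self.versions.append(PageVersion(timestamp, len(self.versions), dom))
        return True


recorder = EvolutionRecorder()
recorder.observe(0.0, "<ul></ul>")                      # user connects
recorder.observe(1.0, "<ul></ul>")                      # unchanged, skipped
recorder.observe(2.5, "<ul><li>menu</li></ul>")         # hover opens a menu
recorder.observe(4.0, '<ul><li>menu</li></ul><input value="a@b.c">')  # field filled
print([(v.timestamp, v.evolution) for v in recorder.versions])
```

The three stored versions correspond to the three collection points in the example above: connection, menu expansion, and entry of the email address.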

At S330, session events are collected. Session events are interactions between a user and a webpage, occurring during the course of a session. Session events may be, as examples and without limitation, mouse clicks on a webpage element, keystrokes, scrolls up or down a page, other, like, events, and any combination thereof. Session events may be collected by one or more means including, without limitation, through a browser extension, included in a web browser installed on a user device, such as the user device, 120, of FIG. 1, above, through an application installed on a user device, such as the user device, 120, of FIG. 1, above, through a tracking tag, token, or other webpage element included in a webpage hosted on a web server, such as the web server, 140, of FIG. 1, above, by other, like, means, and any combination thereof. Where session events are collected through a tracking tag, token, or element included in a webpage, the tracking tag, token, or element may be configured to be invisible and non-interactive from the perspective of a webpage visitor, and may be configured to record some or all of a user's in-session webpage interactions. Session events may be collected on a variety of bases including, without limitation, collection of event batches on a predefined schedule, such as weekly, as may be specified in the session replay generation request received at S310, above, filtered event collection, such as collection only of events including a specific webpage element, continuous collection, and other, like, bases. Collection of session events at S330 may further include collection of event types, event targets, event times, describing the point during the session at which the event occurred, other, like, event-related data, and any combination thereof. Session events may be collected in formats including, without limitation, lists, tables, histograms, machine-interpretable formats such as comma-separated values (CSV), other, like, formats, and any combination thereof.
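As a non-limiting illustration, storing collected session events in a CSV format may be sketched as follows. The sketch assumes each event is a mapping carrying an event type, an event target, and an event time in seconds from the start of the session; the column names in `FIELDS` and the function name `events_to_csv` are hypothetical.

```python
import csv
import io

# Assumed column names for the event type, target, and time.
FIELDS = ["event_type", "target", "time_s"]


def events_to_csv(events) -> str:
    """Serialize session events to CSV text, ordered by event time."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for event in sorted(events, key=lambda e: e["time_s"]):
        writer.writerow(event)
    return buffer.getvalue()


events = [
    {"event_type": "scroll", "target": "body", "time_s": 3.2},
    {"event_type": "click", "target": "#buy", "time_s": 1.5},
]
print(events_to_csv(events))
```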

It may be understood that S330 may be executed at any point after the execution of S310 and before the execution of S340, including simultaneously with S320, without loss of generality or departure from the scope of the disclosure.

At S340, collected webpages and events are stored. Collected events and webpages may be stored to one or more storage devices including, without limitation, memory and storage components of an analytic server, such as the analytic server, 130, of FIG. 1, above, various databases, such as the database, 150, of FIG. 1, above, other, like, storage devices, and any combination thereof. Collected webpages and events may be stored separately or as a combined replay. Where collected webpages and events are stored separately, webpages and events may be ordered according to the various timestamps associated therewith, as well as the times of collection, providing for storage of separate event and webpage recordings corresponding to a common timeline, where the common timeline reflects a visitor's session or any portion thereof. Where collected webpages and events are stored as a combined replay, the combined replay may be configured to include a common timeline, describing the course of a session or portion thereof, to which timeline the collected webpages, webpage evolutions, and events may be mapped to provide a single replay of a visitor's session. Where a combined replay is stored at S340, the replay may be stored in a variety of formats including, without limitation, as a replay video, demonstrating the course of the session from the user's perspective, as an enriched video, including the same demonstration from the user's perspective as well as data features describing events and webpage evolutions on a second-for-second basis, timeline-correlated webpage, webpage evolution, and events lists, other, like, formats, and any combination thereof.
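As a non-limiting illustration, mapping separately collected webpage versions and events onto a common timeline may be sketched as follows. The sketch assumes each input is a list of (timestamp, payload) pairs already sorted by timestamp; the function name `build_combined_replay` and the choice to place a webpage version before an event bearing the same timestamp are hypothetical.

```python
import heapq


def build_combined_replay(page_versions, events):
    """Merge two timestamp-sorted lists into one timeline-ordered replay.

    Each entry of the result is (timestamp, kind, payload), where kind is
    "page" or "event". The numeric tie-breaker (0 for pages, 1 for events)
    places a page version ahead of an event with an equal timestamp.
    """
    tagged_pages = ((t, 0, "page", p) for t, p in page_versions)
    tagged_events = ((t, 1, "event", e) for t, e in events)
    merged = heapq.merge(tagged_pages, tagged_events)
    return [(t, kind, payload) for t, _, kind, payload in merged]


pages = [(0.0, "v0"), (2.0, "v1")]
events = [(1.0, "click"), (2.0, "hover")]
for entry in build_combined_replay(pages, events):
    print(entry)
```

Because both inputs are already ordered, a streaming merge avoids re-sorting the full session and preserves the relative order within each recording.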

FIG. 4 is an example flowchart 400 depicting a method for identifying website main states, according to an embodiment.

At S410, a main state identification request is received. A main state identification request is a request describing one or more session replays for main state identification where, as above, a main state is the state of a webpage reflecting the page's pre-interaction structure and contents. A main state identification request may be generated automatically, such as at the execution of S230 of FIG. 1, above, as well as by other, like means. A main state identification request may include data specifying one or more session replays for analysis. In an embodiment, a main state identification request may include certain filtering or override parameters, where such parameters may be subsequently applied to adjust the processes applied to identify main states, including by allowing for identification of main states after active events, as described hereinbelow. A main state identification request may be received from, without limitation, an analytic server, such as the analytic server, 130, of FIG. 1, above.

At S420, payloads are collected. Payloads may include, without limitation, full session replays, components of session replays such as webpages, webpage HTML codebases, webpage DOMs, various in-session evolutions thereof, session event timelines, and the like, as well as any combination thereof. Payloads may be collected from one or more sources including, without limitation, memory or storage components of an analytic server, such as the analytic server, 130, of FIG. 1, above, databases, such as the database, 150, of FIG. 1, above, from other, like, sources, and any combination thereof.

At S430, visual states are computed. Visual states describe the features of a webpage observable to the user, describing the structure and contents of the visible aspects of the webpage. Visual states may be computed, based on the payloads collected at S420, after every visible change in a webpage during a session, where such visible changes may be represented as DOM changes, with re-computed visual states reflecting the most recent configuration of the webpage. Visual states may be stored, permanently, semi-permanently, or temporarily, to a storage or repository such as those storage and memory components described herein. Visual states may be stored as complete webpages, as webpage HTML or other code, as DOMs, and the like, and any combination thereof. Stored visual states may be appended or otherwise associated with data features describing the point in the session at which a given visual state was computed, the rank or order of the given visual state out of all visual states computed for the session, other, like, visual state data, and any combination thereof.
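
As a non-limiting illustration, the re-computation of visual states after each visible change may be sketched as follows. The sketch assumes, for simplicity, that each DOM is available as a serialized string and that DOM changes arrive as `(time_ms, new_dom)` pairs in session order; the `VisualState` record and `compute_visual_states` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualState:
    rank: int      # order of this state among all states for the session
    time_ms: int   # point in the session at which the state was computed
    dom: str       # serialized DOM (assumed representation)


def compute_visual_states(initial_dom, dom_changes):
    """Record a new visual state after every change that alters the DOM."""
    states = [VisualState(rank=0, time_ms=0, dom=initial_dom)]
    for time_ms, new_dom in dom_changes:
        # A change that leaves the DOM identical is not a visible change.
        if new_dom != states[-1].dom:
            states.append(VisualState(len(states), time_ms, new_dom))
    return states
```

Each stored state carries its session time and rank, corresponding to the data features described above.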

At S440, classifiers are collected. Classifiers are data features describing content zones or elements as “active” or “inactive.” Particularly, a classifier may include an event (or mutation) on an HTML DOM allowing for classification of a target path as active for the event. An “active” content zone or element is a content zone or element with which user interaction generates a visual change to a webpage, thereby altering the DOM of the visible version of the webpage. Examples of user interactions which may be applicable to such alterations of the DOM of the visible version of a webpage include, without limitation, clicks, mouse hovers, scrolls up or down a webpage, other, like, interactions, and any combination thereof. An example of an “active” content zone or element is an expanding menu button which, when clicked, expands into a menu occupying the top half of the webpage, thereby altering the DOM. An “inactive” content zone or element is a content zone or element with which user interaction does not produce a visible change to a webpage. An example of an “inactive” element is a product image on a retail website, where no visible change is generated by a user's click on the product image. Classifiers may be collected from one or more sources including, without limitation, classifier repositories, dictionaries, and the like, as well as any combination thereof, which sources may, in turn, be collected from, without limitation, the web server, 140, of FIG. 1, above, the database, 150, of FIG. 1, above, the analytic server, 130, of FIG. 1, above, and the like, as well as any combination thereof. Classifiers may be generated automatically, such as by analysis of webpage HTML to detect elements with which an interaction generates a DOM change. Further, classifiers may be manually defined, including by manual review of webpage elements and attribution of various classifiers to each reviewed element. In an embodiment, classifiers may be generated by means including, without limitation, application of machine learning, artificial intelligence, and other, like, techniques, as well as any combination thereof.
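
As a non-limiting illustration, one simple way to generate classifiers automatically is to label a target “active” if any recorded interaction with that target was accompanied by a DOM change, and “inactive” otherwise. This rule, the `(target, dom_before, dom_after)` observation format, and the `generate_classifiers` function are assumptions made for the sketch; the disclosure does not limit automatic generation to this approach.

```python
def generate_classifiers(observations):
    """Map each target path to "active" or "inactive".

    observations: iterable of (target, dom_before, dom_after) tuples, where
    the DOMs are serialized strings captured around one interaction.
    """
    classifiers = {}
    for target, dom_before, dom_after in observations:
        if dom_before != dom_after:
            # At least one interaction altered the DOM: the target is active.
            classifiers[target] = "active"
        else:
            # Keep an earlier "active" label; otherwise default to inactive.
            classifiers.setdefault(target, "inactive")
    return classifiers
```

Under this rule, the expanding menu button of the example above is labeled “active” and the product image is labeled “inactive.”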

At S450, targets are identified. Targets are webpage zones or elements with which, during the course of a session described by a feature of a payload collected at S420, a user is recorded interacting once or more. Targets may be identified by analysis of events included in the payloads collected at S420. Targets may include, as examples and without limitation, buttons, fields, and other, like, webpage zones or elements. Identified targets may be recorded by generating one or more registers describing the various events, zones or elements, event times, event orders or ranks, and other, like, data, as collected at S420. Target identification registers may be stored or archived in one or more temporary, permanent, or semi-permanent storage media, such as those described herein.
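
As a non-limiting illustration, a target identification register may be sketched as a mapping from each interacted-with zone or element to the events recorded on it, together with event times and ranks. The dictionary keys (`type`, `target`, `time_ms`) and the `build_target_register` function are hypothetical.

```python
from collections import defaultdict


def build_target_register(events):
    """Group session events by target, recording each event's time and rank.

    events: iterable of dicts with "type", "target", and "time_ms" keys.
    """
    register = defaultdict(list)
    ordered = sorted(events, key=lambda e: e["time_ms"])
    for rank, event in enumerate(ordered, start=1):
        register[event["target"]].append(
            {"type": event["type"], "time_ms": event["time_ms"], "rank": rank}
        )
    return dict(register)
```

Every key of the resulting register is a target, since a zone or element appears only if at least one interaction with it was recorded.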

At S460, main states are computed. Main states may be computed as the visual states immediately preceding the first interaction event with a target having an “active” classifier. Where event timelines collected at S420 correlate with visual states computed at S430, describing both the contents of the session and the site visitor's view of the webpage during the session, a main state may be identified. A main state may be identified by correlating the event timelines collected at S420 with the targets identified at S450 and applying the classifiers collected at S440 to the same identified targets. Where a timeline of events includes a first triggering interaction with an “active” target, where a triggering interaction is an interaction generating a DOM-altering webpage change, a main state may be identified as the latest-computed visual state which was computed prior to the first triggering interaction.

As an example, a loading website may undergo several DOM changes as content elements are downloaded from a web server and presented to a site visitor. At each DOM change, a new visual state may be computed, and prior visual states may be cached or stored as described hereinabove. In the same example, after the webpage has loaded, a site visitor may click first on a product image, generating no DOM change, and, subsequently, on an expanding menu button, generating a DOM change. In the example, the site visitor's click on the menu button is a triggering interaction, while the visitor's click on the product image is not. As a result, in the same example, the main state is computed as the most-recent visual state computed prior to the visitor's click on the menu button.
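
As a non-limiting illustration, the main state computation of S460 may be sketched as follows. The sketch assumes visual states are given as `(time_ms, dom)` pairs and events as `(time_ms, target)` pairs, both in session order, and that classifiers are a mapping from target to “active” or “inactive”; these representations and the `compute_main_state` function are hypothetical. Where no triggering interaction exists, the sketch returns `None`, as that case is not specified above.

```python
def compute_main_state(visual_states, events, classifiers):
    """Return the latest visual state computed before the first triggering
    interaction, i.e. the first event on a target classified as "active"."""
    trigger_time = next(
        (time_ms for time_ms, target in events
         if classifiers.get(target) == "active"),
        None,
    )
    if trigger_time is None:
        return None  # assumption: no triggering interaction, no main state
    prior = [dom for time_ms, dom in visual_states if time_ms < trigger_time]
    return prior[-1] if prior else None


# The loading-website example above, with assumed times in milliseconds.
visual_states = [
    (0, "loading"),
    (200, "hero image loaded"),
    (400, "fully loaded"),
    (5000, "menu expanded"),
]
events = [(3000, "img.product"), (4900, "button.menu")]
classifiers = {"img.product": "inactive", "button.menu": "active"}
main_state = compute_main_state(visual_states, events, classifiers)
```

Here the click on the product image is skipped because its classifier is “inactive,” so `main_state` is the fully loaded page rather than any earlier loading state.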

At S470, main states are returned. Main states may be returned as complete webpages, as webpage HTML or other, like, codebases, as webpage DOMs, and the like, as well as any combination thereof.

FIG. 5 is an example diagram depicting an unlabeled document object model (DOM) tree 500, according to an embodiment. The unlabeled DOM tree 500 provides a visual representation of the hierarchical structure of a webpage's HTML code, with content zones or elements represented as nodes, 510-1 through 510-6 (hereinafter, “nodes” 510). In the example unlabeled DOM tree 500, related nodes 510 are joined by “links” 520, representing the relationships between two nodes 510. In the example unlabeled DOM tree 500, links 520 are established between nodes 510-1 and 510-2 and between nodes 510-3 and 510-1.

In the example unlabeled DOM tree 500, nodes 510-3 and 510-2 are disposed on a second tier below the first tier occupied by node 510-1, reflecting a structure wherein the content element or zone represented by node 510-1 includes the content elements or zones represented by nodes 510-2 and 510-3. Although only the link 520 between nodes 510-1 and 510-2 is labeled, this label is provided for simplicity, and other, like, links 520 may be likewise labeled without loss of generality or departure from the scope of the disclosure.
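
As a non-limiting illustration, the portion of the DOM tree 500 described above (node 510-1 including nodes 510-2 and 510-3) may be sketched as a simple tree structure in which each parent-child relationship corresponds to a link 520. The `Node` class and the helper functions are hypothetical, and the placement of nodes 510-4 through 510-6 is omitted because it is not described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def links(root):
    """Yield a (parent, child) label pair for every link, depth-first."""
    for child in root.children:
        yield (root.label, child.label)
        yield from links(child)


def tier_of(root, label, tier=1):
    """Return the tier of the labeled node, with the root on tier 1."""
    if root.label == label:
        return tier
    for child in root.children:
        found = tier_of(child, label, tier + 1)
        if found is not None:
            return found
    return None


tree = Node("510-1", [Node("510-2"), Node("510-3")])
```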

FIG. 6A is an illustration 600 depicting a retroactive zoning analysis request tool, according to an embodiment. A retroactive zoning analysis tool, such as that depicted in the illustration 600, may be configured to accept user input and generate a retroactive zoning analysis request therefrom. Such a tool may be included as a feature of a webpage, accessible through a web browser, or as an application, where such a web browser, such an application, or both, may be installed on a user device, such as the user device, 120, of FIG. 1, above.

The retroactive zoning analysis request tool depicted in the illustration 600 includes a mode selector 610, with selections for “current version” mode 620 and “older version” mode 630, as well as a device view selector 640. The tool may be configured to, when the “current version” mode 620 selection is selected, generate a retroactive zoning analysis request including a specification requesting analysis of a current or live webpage. Such a “current version” mode request may be configured to initiate the execution of various processes described hereinabove with respect to a live version of a given webpage. Where the “older version” mode 630 is selected, the tool may be configured to accept one or more user inputs describing one or more URLs for analysis according to the methods described hereinabove.

Further, the tool may be configured to generate a retroactive zoning analysis request specifying a given device view mode. As a webpage may be rendered differently for presentation via different devices, such as desktop computers, smart phones, and tablet computers, analysis of the various renderings may result in different outputs. Where a device view mode is selected via the device view selector 640, the tool may be configured to generate a retroactive zoning analysis request specifying analysis of one or more webpages formatted for display on the selected device or devices.

FIG. 6B is an illustration 650 depicting a snapshot selector, according to an embodiment. A snapshot selector may be applicable to the selection of one or more snapshots during a snapshot selection process, such as at S240, above. The snapshot selector 650 may be provided as a feature of a website or webpage, accessible through a web browser, as an application or a feature of an application, or the like, and any combination thereof, where both a web browser and such an application may be installed on a user device, such as the user device, 120, of FIG. 1, above, either separately or in combination.

The snapshot selector included in the illustration 650 includes a site selector 660, a date selector 670, one or more snapshots 680, and a snapshot sort tool 690. The site selector 660 may be configured to provide for the selection of a website or webpage of interest, including from a list of websites or webpages for which snapshots are available. The site selector 660 may be configured to be a drop-down list, an open-ended text entry field, and the like. The date selector 670 may be configured to provide for the selection of a date or range of dates for which snapshots are available. The snapshot sort tool 690 may be configured to provide for sorting snapshots, within the specified webpage or site and the specified date or date range, based on factors including, without limitation, the snapshots' timestamps, as well as other, like, factors.

The snapshots 680 may include one or more webpage snapshots, collected as described hereinabove, for the specified webpage or site and the specified date or date range. The snapshots 680 may be sorted using the snapshot sort tool 690. The snapshots 680 may include one or more descriptive data features including, without limitation, snapshot URL, snapshot date, snapshot time, other, like, factors, and any combination thereof.

It may be understood that, while one snapshot 680 in the illustration 650 is labeled for simplicity, other, like, snapshots may be likewise labeled without loss of generality or departure from the scope of the disclosure.
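
As a non-limiting illustration, the combined behavior of the site selector 660, the date selector 670, and the snapshot sort tool 690 may be sketched as a filter over stored snapshots followed by a sort on timestamp. The `Snapshot` record and the `select_snapshots` function are hypothetical, and the example URL is a placeholder.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Snapshot:
    url: str             # snapshot URL
    taken_at: datetime   # snapshot date and time


def select_snapshots(snapshots, url, start, end, newest_first=True):
    """Return snapshots for one URL within [start, end], sorted by timestamp."""
    matches = [
        s for s in snapshots
        if s.url == url and start <= s.taken_at.date() <= end
    ]
    return sorted(matches, key=lambda s: s.taken_at, reverse=newest_first)
```

For example, `select_snapshots(snapshots, "https://example.com/", date(2023, 1, 1), date(2023, 1, 31))` would list the January 2023 snapshots of that page, newest first.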

FIG. 7 is an illustration 700 of a zoning analysis presentation platform, according to an embodiment. A zoning analysis presentation platform, such as that included in the illustration 700, may be configured to provide visual outputs and reports, as provided according to the methods described hereinabove, as well as to accept user input, where accepted user input may be applicable to configuration of the platform for various views, reports, and outputs. A zoning analysis presentation platform may be provided as a feature of a website or webpage, accessible through a web browser, as an application or feature of an application, and the like, as well as any combination thereof, where a web browser and an application may be installed on a user device, such as the user device, 120, of FIG. 1, above, either separately or in combination.

The zoning analysis presentation platform depicted in the illustration 700 includes a viewing pane 710, one or more zone analysis overlays 720, a device view mode selector 730, a date range selector 740, a conditions selector 750, a metric selector 760, a URL selector 770, and a version date selector 780. The viewing pane 710 may be configured to present a version of a selected webpage, based on the various selectors, including zone analysis overlays 720. Zone analysis overlays 720 provide per-zone reports of the various zone identities and metrics identified and collected as described hereinabove. Zone analysis overlays 720 may be configured to provide aggregate zone metric values including, without limitation, averages, medians, and the like, as well as any combination thereof, as described with respect to S220, above. Although only one zone analysis overlay 720 is labeled for purposes of simplicity, it may be understood that some or all zone analysis overlays 720 may be so labeled without loss of generality or departure from the scope of the disclosure.

The device view mode selector 730 may be configured to provide for the selection of a device view mode, in which view mode the selected webpage or site is presented in the viewing pane 710. Examples of device view modes include desktop mode, tablet computer mode, mobile phone mode, and the like. The date range selector 740 may be configured to provide for the selection of various date ranges. Where a date range is selected through the date range selector 740, the data included in the calculation of aggregate metric values presented in the zone analysis overlays 720 may be limited to only those metric values collected during the specified date range. The conditions selector 750 may be configured to restrict the metric values used in the calculation of aggregate metric values, where aggregate metric values are presented in the zone analysis overlays 720, to only those metric values matching the selected conditions. Example conditions which may be applicable to restrict metric values include, without limitation, values collected from sessions with lengths longer than one hour, values collected from sessions including specific internet protocol (IP) addresses, other, like, conditions, and any combination thereof.
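
As a non-limiting illustration, restricting the metric values used in an aggregate calculation by date range and by a session-length condition may be sketched as follows. The record keys (`zone`, `date`, `session_seconds`, and the metric name) and the `aggregate_zone_metric` function are hypothetical.

```python
from datetime import date
from statistics import mean, median


def aggregate_zone_metric(records, zone, metric, start, end,
                          longer_than_seconds=None):
    """Aggregate one metric for one zone over the selected date range,
    optionally keeping only sessions longer than the given length."""
    values = [
        r[metric] for r in records
        if r["zone"] == zone
        and start <= r["date"] <= end
        and (longer_than_seconds is None
             or r["session_seconds"] > longer_than_seconds)
    ]
    if not values:
        return None  # nothing matched the selected range and conditions
    return {"average": mean(values), "median": median(values),
            "count": len(values)}
```

Passing `longer_than_seconds=3600` corresponds to the example condition of sessions with lengths longer than one hour; other conditions, such as specific IP addresses, could be added as further filters in the same manner.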

The metric selector 760 may be configured to provide for selection of one or more metrics of interest, where the selected metrics of interest are presented through the zone analysis overlays 720. The metric selector 760 may be configured as, as examples and without limitation, a “search” field, a drop-down list, a multiple-choice selector, and the like. The URL selector 770 may be configured to provide for selection of a webpage or site for presentation through the viewing pane 710, as well as analysis as described hereinabove. The URL selector 770 may be configured as a drop-down list, an open-ended text-entry field, or the like. The version date selector 780 provides for selection of a version of the webpage or site selected via the URL selector 770, at a specified date, for presentation through the viewing pane 710 and analysis as described hereinabove. In an embodiment, the version date selector 780 may be configured to open or launch a snapshot or version selector, such as that described with respect to FIG. 6B, above, to provide for selection of a preferred site snapshot or version where multiple snapshots or versions are included for the specified date and URL.

FIG. 8 is an example schematic diagram of an analytic server 130, according to an embodiment. The analytic server 130 includes a processing circuitry 810 coupled to a memory 820, a storage 830, and a network interface 840. In an embodiment, the components of the server 130 may be communicatively connected via a bus 850.

The processing circuitry 810 may be realized as one or more hardware logic components and circuits. For example, and without limitation, illustrative types of hardware logic components that can be used include field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), graphics processing units (GPUs), tensor processing units (TPUs), general-purpose microprocessors, microcontrollers, digital signal processors (DSPs), and the like, or any other hardware logic components that can perform calculations or other manipulations of information.

The memory 820 may be volatile (e.g., random access memory, etc.), non-volatile (e.g., read only memory, flash memory, etc.), or a combination thereof.

In one configuration, software for implementing one or more embodiments disclosed herein may be stored in the storage 830. In another configuration, the memory 820 is configured to store such software. Software shall be construed broadly to mean any type of instructions, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable format of code). The instructions, when executed by the processing circuitry 810, cause the processing circuitry 810 to perform the various processes described herein.

The storage 830 may be magnetic storage, optical storage, and the like, and may be realized, for example, as flash memory or another memory technology, compact disk-read only memory (CD-ROM), Digital Versatile Disks (DVDs), or any other medium which can be used to store the desired information.

The network interface 840 allows the analytic server 130 to communicate with the various components, devices, and systems described herein for collection of a website in a past state and retroactive analysis thereof, as well as other, like, purposes.

It should be understood that the embodiments described herein are not limited to the specific architecture illustrated in FIG. 8, and other architectures may be equally used without departing from the scope of the disclosed embodiments.

The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPUs”), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Further, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.

It should be understood that any reference to an element herein using a designation such as “first,” “second,” and so forth does not generally limit the quantity or order of those elements. Rather, these designations are generally used herein as a convenient method of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements may be employed there or that the first element must precede the second element in some manner. Also, unless stated otherwise, a set of elements comprises one or more elements.

As used herein, the phrase “at least one of” followed by a listing of items means that any of the listed items can be utilized individually, or any combination of two or more of the listed items can be utilized. For example, if a system is described as including “at least one of A, B, and C,” the system can include A alone; B alone; C alone; 2A; 2B; 2C; 3A; A and B in combination; B and C in combination; A and C in combination; A, B, and C in combination; 2A and C in combination; A, 3B, and 2C in combination; and the like.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosed embodiments and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosed embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

1. A method for collection of a website in a past state and retroactive analysis thereof, comprising: collecting, from a repository, at least one session replay; identifying, in the at least one collected session replay, at least one main state, wherein a main state is a portion of a session replay; receiving a selection from a user of a webpage snapshot from a plurality of webpage snapshots, the selected webpage snapshot corresponding to a respective main state of the at least one identified main state, wherein each snapshot is a single-instant recording of a webpage state at a specific point in time, the snapshot comprising descriptive data features that reflect webpage structure and webpage content at the specific point in time; identifying, in the at least one selected snapshot, at least one webpage zone; and returning the at least one identified zone.

2. The method of claim 1, wherein the main state comprises pre-interaction webpage structure and webpage content.

3. The method of claim 1, wherein at least one collected session replay includes a description of a webpage visitor's interactions with a webpage.

4. The method of claim 3, wherein the at least one collected session replay further includes: descriptions of visitors' interactions with an auto-anonymized webpage.

5. The method of claim 1, wherein identifying at least one main state further comprises: collecting a payload; computing, from the collected payload, at least one visual state; collecting a plurality of classifiers; identifying a plurality of zones of the at least one computed visual state; computing a main state; and returning the computed main state.

6. The method of claim 5, wherein the at least one computed visual state is a single-instant description of a website's visual state from the perspective of a site visitor.

7. The method of claim 5, wherein the collected payload is a session replay.

8. The method of claim 5, further comprising: generating the plurality of classifiers using a machine learning model.

9. The method of claim 8, wherein each of the plurality of collected classifiers is any one of: active and inactive.

10. The method of claim 1, further comprising: generating the at least one session replay; and storing the at least one generated session replay in the repository.

11. The method of claim 10, wherein generating the at least one session replay further comprises: collecting a target webpage; collecting a plurality of session events, wherein a session event is an interaction by a user with one or more zones or elements of the target webpage; and storing the collected target webpage and the plurality of session events.

12. A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process for collection of a website in a past state and retroactive analysis thereof, the process comprising: collecting, from a repository, at least one session replay; identifying, in the at least one collected session replay, at least one main state, wherein a main state is a portion of a session replay; receiving a selection from a user of a webpage snapshot from a plurality of webpage snapshots, the selected webpage snapshot corresponding to a respective main state of the at least one identified main state, wherein each snapshot is a single-instant recording of a webpage state at a specific point in time, the snapshot comprising descriptive data features that reflect webpage structure and webpage content at the specific point in time; identifying, in the at least one selected snapshot, at least one webpage zone; and returning the at least one identified zone.

13. A system for collection of a website in a past state and retroactive analysis thereof, comprising: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to perform operations comprising: collecting, from a repository, at least one session replay; identifying, in the at least one collected session replay, at least one main state, wherein a main state is a portion of a session replay; receiving a selection from a user of a webpage snapshot from a plurality of webpage snapshots, the selected webpage snapshot corresponding to a respective main state of the at least one identified main state, wherein each snapshot is a single-instant recording of a webpage state at a specific point in time, the snapshot comprising descriptive data features that reflect webpage structure and webpage content at the specific point in time; identifying, in the at least one selected snapshot, at least one webpage zone; and returning the at least one identified zone.

14. The system of claim 13, wherein the main state comprises pre-interaction webpage structure and webpage content.

15. The system of claim 12, wherein at least one collected session replay includes a description of a webpage visitor's interactions with a webpage.

16. The system of claim 15, wherein the at least one collected session replay further includes: descriptions of visitors' interactions with an auto-anonymized webpage.

17. The system of claim 13, wherein the operations further comprise: collecting a payload; computing, from the collected payload, at least one visual state; collecting a plurality of classifiers; identifying a plurality of zones of the at least one computed visual state; computing a main state; and returning the computed main state.

18. The system of claim 17, wherein the at least one computed visual state is a single-instant description of a website's visual state from the perspective of a site visitor.

19. The system of claim 17, wherein the collected payload is a session replay.

20. The system of claim 17, wherein the operations further comprise: generating the plurality of classifiers using a machine learning model.

21. The system of claim 20, wherein each of the plurality of collected classifiers is any one of: active and inactive.

22. The system of claim 13, wherein the operations further comprise: generating the at least one session replay; and storing the at least one generated session replay in the repository.

23. The system of claim 22, wherein the operations further comprise: collecting a target webpage; collecting a plurality of session events, wherein a session event is an interaction by a user with one or more zones or elements of the target webpage; and storing the collected target webpage and the plurality of session events.